PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory

Wang, Chenyang, Yang, Yun

arXiv.org Machine Learning

We derive explicit non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors, that is, data-dependent distributions over model parameters obtained by exponentially tilting a prior with the empirical risk. Unlike classical worst-case complexity bounds based on uniform laws of large numbers, which require explicit control of the model space in terms of metric entropy integrals, our analysis yields posterior-averaged risk bounds that can be applied to overparameterized models and adapt to the data structure and the intrinsic model complexity. The bound involves a marginal-type integral over the parameter space, which we analyze using tools from singular learning theory to obtain explicit and practically meaningful characterizations of the posterior risk. Applications to low-rank matrix completion and ReLU neural network regression and classification show that the resulting bounds are analytically tractable and substantially tighter than classical complexity-based bounds. Our results highlight the potential of PAC-Bayes analysis for precise finite-sample generalization guarantees in modern overparameterized and singular models.
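The "exponential tilting" in the abstract can be sketched numerically. Below is a minimal illustration on a discrete parameter grid; the grid, uniform prior, quadratic toy risk, and inverse temperature `beta` are all made-up stand-ins for exposition, not the paper's actual models or bounds.

```python
import numpy as np

n = 200                                      # sample size
thetas = np.linspace(-2.0, 2.0, 101)         # discrete parameter grid
prior = np.full(thetas.size, 1.0 / thetas.size)  # uniform prior

# Toy empirical risk with a minimum near theta = 0.5 (illustrative only)
emp_risk = (thetas - 0.5) ** 2

beta = 1.0  # inverse temperature of the Gibbs posterior
# Gibbs posterior: exponentially tilt the prior by the empirical risk,
# working in log space for numerical stability
log_w = np.log(prior) - beta * n * emp_risk
log_w -= log_w.max()
posterior = np.exp(log_w)
posterior /= posterior.sum()

# Posterior-averaged empirical risk: the quantity PAC-Bayes bounds control
post_risk = float(posterior @ emp_risk)
prior_risk = float(prior @ emp_risk)
# Tilting concentrates mass on low-risk parameters, so the
# posterior-averaged risk cannot exceed the prior-averaged risk
assert post_risk <= prior_risk
```

The tilting downweights high-risk parameters exponentially in both `beta` and the sample size, which is why the posterior-averaged risk shrinks relative to the prior average.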


Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral

Liu, Jingbo

arXiv.org Machine Learning

We show that the maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applies a form of the majorizing measure theorem.


Starting Off on the Wrong Foot: Pitfalls in Data Preparation

Guo, Jiayi, Dong, Panyi, Quan, Zhiyu

arXiv.org Machine Learning

When working with real-world insurance data, practitioners often encounter challenges during the data preparation stage that can undermine the statistical validity and reliability of downstream modeling. This study illustrates that conventional data preparation procedures, such as random train-test partitioning, often yield unreliable and unstable results when confronted with highly imbalanced insurance loss data. To mitigate these limitations, we propose a novel data preparation framework leveraging two recent statistical advancements: support points for representative data splitting to ensure distributional consistency across partitions, and the Chatterjee correlation coefficient for initial, non-parametric feature screening to capture feature relevance and dependence structure. We further integrate these theoretical advances into a unified, efficient framework that also incorporates missing-data handling, and embed this framework within our custom InsurAutoML pipeline. The performance of the proposed approach is evaluated using both simulated datasets and datasets often cited in the academic literature. Our findings demonstrate that incorporating statistically rigorous data preparation methods not only significantly enhances model robustness and interpretability but also substantially reduces computational resource requirements across diverse insurance loss modeling tasks. This work provides a methodological upgrade for achieving reliable results in high-stakes insurance applications.
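The Chatterjee correlation coefficient mentioned for feature screening has a simple rank-based form. The sketch below implements the no-ties version of Chatterjee's xi (Chatterjee, 2020) as an assumed stand-alone helper; it is not the paper's InsurAutoML implementation, which may handle ties and missing values differently.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, simplified no-ties case.

    Near 0 when x and y are independent; near 1 when y is a
    (possibly non-monotone) function of x, which is what makes it
    useful for non-parametric feature screening.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = x.size
    order = np.argsort(x, kind="stable")        # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1  # ranks of y in that order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
xi_dep = chatterjee_xi(x, x ** 2)                  # non-monotone dependence
xi_ind = chatterjee_xi(x, rng.normal(size=500))    # independent noise
```

Because xi detects arbitrary functional dependence (here `y = x**2`, which Pearson's r would score near zero), screening features by xi keeps nonlinear predictors that a linear correlation filter would discard.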


One giant leap for planetary defence: NASA successfully changed an asteroid's orbit around the SUN, new study reveals

Daily Mail - Science & tech

Humanity has taken a 'notable step forward' in its ability to deflect asteroids heading towards Earth, a new study reveals. Back in 2022, NASA deliberately smashed a spacecraft into a small asteroid 'moonlet' that orbited a larger space rock. The probe, called Dart, successfully changed the path of the moonlet, called Dimorphos, around its parent asteroid, Didymos. The mission was hailed as the first-ever successful demonstration of planetary defence, proving humanity can alter an asteroid's trajectory. But now, scientists have revealed the test also knocked both asteroids off their regular orbit around the Sun.



In search of the next generation of multimodal datasets

Neural Information Processing Systems

While these advances use different algorithmic techniques, e.g., contrastive learning, diffusion, or auto-regressive modeling, they all rest on a common foundation: large datasets containing paired image-text examples.





Backdoor Attacks on Multivariate Time Series Forecasting

Neural Information Processing Systems

Unlike traditional backdoor attacks that focus on specific class labels, our approach aims to induce poisoned models to predict future data as a predefined target pattern.